Abstract

We demonstrate how large classes of discrete and continuous statistical distribu- 
tions can be incorporated into coherent states, using the concept of a reproducing 
kernel Hilbert space. Each family of coherent states is shown to contain, in a sort 
of duality, which resembles an analogous duality in Bayesian statistics, a discrete 
probability distribution and a discretely parametrized family of continuous distri- 
butions. It turns out that nonlinear coherent states, of the type widely studied in 
quantum optics, are a particularly useful class of coherent states from this point of 
view, in that they contain many of the standard statistical distributions. We also 
look at vector coherent states and multidimensional coherent states as carriers of 
mixtures of probability distributions and joint probability distributions. 
I Introduction



In a series of recent papers [16, 17, 18], an intimate connection between certain families
of coherent states and statistical distributions has been demonstrated and studied.
The coherent states discussed in these papers all have group theoretical origins and 
the Haar measure on the group has then been shown to induce a prior measure on the 
statistical parameters entering the definition of the discrete distributions. In this pa- 
per we look at a broader class of coherent states, which do not necessarily have their 
origins in group representations. In particular we show how, under certain technical 
restrictions, we can start with a discrete probability distribution, depending on a single 
real parameter, and associate coherent states to it. In the process we obtain a natural 
family of discretely indexed continuous distributions, which are then in a sort of duality 
with the original discrete distribution, via the coherent states. This duality is highly 
reminiscent of a similar duality observed in the theory of Bayesian statistics, since the 
resolution of the identity condition, which we impose on the coherent states, introduces 
a preferred prior measure on the parameter space of the discrete distribution, with this 
distribution itself playing the role of the likelihood function. The associated discretely 
indexed continuous distributions become the related conditional posterior distributions. 
Alternatively, one can also start with a discretely parametrized family of continuous 
distributions, and under a certain convergence assumption, once more build coherent 
states. These coherent states then again give rise to a dual discrete distribution or like- 
lihood function. We illustrate the theory by looking at a few examples of well-known 
statistical distributions (additional examples may be found in [UJ). Although most of 
these examples have been studied earlier, in the context of Glauber-Klauder-Sudarshan
or Gilmore-Perelomov coherent states [16, 17, 18], we analyze them here from the present
perspective, i.e., without invoking any group property.

We take the discussion further by studying the relevance of vector coherent states 
and multidimensional coherent states when mixtures of probability distributions or joint 
distributions are considered. As far as we are aware, this is the first time that such vector 
coherent states have been studied in connection with statistical distributions. 






II Experimental model context 



In the following paragraphs, using simple experimental setups, we try to motivate the 
simultaneous appearance of a family of discrete probability distributions and a family 
of continuous distributions in the sort of duality referred to earlier. First we describe a 
classical statistical procedure known as Bayesian inference. Then, as indicated above, 
we will consider a relationship between our subsequent mathematical analysis and this 
classical procedure. (See Appendix.) 

II.1 Discrete Case

Suppose we have an experimental setup for which we have an "experimental model"
in the form of a family of discrete probability distributions $n \mapsto P(n, \lambda)$ relating to a
discrete set of possible experimental outcomes. That is, we do not know the preparation
exactly, only to the extent of a family of states, indexed, say, by the parameter $\lambda$, which
takes (continuous) values in some parameter space. The parameter usually represents a
quantitative property of interest. In fact, the whole idea of the experiment, presumably,
is to obtain data with which to estimate the physical property represented by the
parameter. As an elementary example, let us think in terms of setting up an experiment
to toss a coin $N$ times and count the total number, $k$, of heads. Now perform the
experiment and designate the observed value of $k$ as $k_{\rm obs}$. Then use $k_{\rm obs}$ to estimate
the bias of the coin. The statistical model would be a family of binomial distributions
indexed by a parameter $p$ with "true" but unknown parameter value $p_0$. One can
estimate the value of $p_0$ as $p_{\rm est} = k_{\rm obs}/N$. But conditionally upon the observed value, $k_{\rm obs}$,
one may consider $p$ as a random variable and construct a certain conditional probability
distribution over the parameter space, which we now treat as a measurable space. The
motivation for this inference procedure is that, for example, one could then find subsets
of the parameter space for which one could make statements such as "given the result
of the experiment, there is a 99% chance that the true value $p_0$ lies within that subset".
(Think of an experiment where one tossed a coin 1000 times and got 999 heads.)
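
The coin example can be made concrete with a small numerical sketch. It assumes a uniform prior on $p$ (an assumption made here for illustration; the text has not yet fixed a prior), so that the posterior density is proportional to $p^{k_{\rm obs}}(1-p)^{N-k_{\rm obs}}$. The function name `posterior_mass` and the grid size are hypothetical choices.

```python
import math

def posterior_mass(k_obs, n_trials, lo, hi, grid=200_000):
    """Posterior probability that p lies in [lo, hi], assuming a uniform prior.

    The unnormalized posterior p**k * (1-p)**(N-k) is evaluated in log space
    on a midpoint grid over (0, 1) and normalized numerically.
    """
    def log_like(p):
        return k_obs * math.log(p) + (n_trials - k_obs) * math.log1p(-p)

    h = 1.0 / grid
    mids = [(i + 0.5) * h for i in range(grid)]
    logs = [log_like(p) for p in mids]
    peak = max(logs)                       # subtract the peak to avoid underflow
    weights = [math.exp(v - peak) for v in logs]
    total = sum(weights)
    inside = sum(w for p, w in zip(mids, weights) if lo <= p <= hi)
    return inside / total

# 999 heads in 1000 tosses: posterior mass of the subset [0.99, 1]
mass = posterior_mass(999, 1000, 0.99, 1.0)
print(mass)
```

Under these assumptions the subset $[0.99, 1]$ already carries more than 99% of the posterior mass, which is the kind of statement described above.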
II.2 The duality



In the Bayesian context, both the quantity to be observed and the unknown parameter
are considered to be random quantities, playing a dual role. We consider two conditional
probability distributions. Before performing the random experiment, the experimental
model in the form of a family $P(y, \lambda)$ of discrete probability distributions is viewed as
a conditional distribution of the random variable $Y$ given the parameter value, say $\lambda$.
After performing the experiment, we have an observed value, say $y_{\rm obs}$, and we compute
the conditional probability density function of the parameter $\lambda$ given $y_{\rm obs}$, obtaining
a posterior conditional probability distribution. But, of course, we need to choose a
prior measure $P(d\lambda)$. Suppose we have a probability density function $\Pi$, where
$P(d\lambda) = \Pi(\lambda)\, d\lambda$. The posterior probability density function is then given by [HI E7] (see also the
Appendix at the end):

$$ f(\lambda, y_{\rm obs}) = \frac{P(y_{\rm obs}, \lambda)\,\Pi(\lambda)}{\int P(y_{\rm obs}, \lambda')\,\Pi(\lambda')\, d\lambda'} . \tag{2.1} $$

A prototype classical example of the binomial distribution is the coin tossing
experiment mentioned above and given in the Appendix. In that classical context, the
posterior conditional probability density function for the parameter $p$ would be obtained
according to (2.1).

An example of a Bayesian approach involving the binomial distribution in a quantum
context is given in [27]. A thought experiment is described involving a count of photons
which are passed through a polarizer, a pinhole, and a calcite crystal, eventually triggering
a detector as (+) or (−). In that context, a posterior distribution is obtained via (2.1)
for the binomial parameter $\theta$, the direction of the polarizer.

In [27], the family of probability distributions which we have called the stochastic
model for the experiment is designated as predictive. The conditional probability
distribution for the parameter that we have called Bayesian posterior is there designated as
retrodictive.
III A general setting for statistical distributions and
coherent states



Let $\{X, \mu\}$ be a measure space. $X$ could, for example, be the space of some statistical
parameters or a larger space containing such parameters. Consider the Hilbert space
$\widetilde{\mathfrak{H}} = L^2(X, \mu)$ and suppose that it contains a reproducing kernel subspace $\mathfrak{H}_K$. This
means that for any orthonormal basis, $\{\Phi_k\}_{k=0}^{N}$ of $\mathfrak{H}_K$ (where $N$ could be finite or
infinite) the following is true:

1. $\sum_{k=0}^{N} |\Phi_k(x)|^2 < \infty$, for almost all $x \in X$ and in fact, it is possible to define the
functions $\Phi_k(x)$ in a way so that this convergence condition holds everywhere.

2. The function

$$ K(x, y) = \sum_{k=0}^{N} \Phi_k(x)\,\overline{\Phi_k(y)} \tag{3.1} $$

defines a reproducing kernel, i.e., $K(x, y)$ satisfies the properties,

$$ K(x, y) = \overline{K(y, x)}\,, \quad K(x, x) > 0\,, \quad \text{for all } x \in X\,; \qquad \int_X K(x, z)\,K(z, y)\, d\mu(z) = K(x, y)\,, \quad \text{for all } x, y \in X\,. \tag{3.2} $$

It turns out that the kernel is independent of the orthonormal basis chosen to
represent it.

For such a Hilbert space $\mathfrak{H}_K$, we can define a set of vectors, $|x\rangle$, labelled by the
points of $X$ in the manner:

$$ |x\rangle = \mathcal{N}(x)^{-1/2}\, K(\,\cdot\,, x) = \mathcal{N}(x)^{-1/2} \sum_{k=0}^{N} \overline{\Phi_k(x)}\,\Phi_k\,, \qquad \mathcal{N}(x) = K(x, x) = \sum_{k=0}^{N} |\Phi_k(x)|^2 . \tag{3.3} $$

The normalization factor $\mathcal{N}(x)$ is chosen in order to ensure that $\langle x \,|\, x\rangle = 1$. In view of
(3.2), these vectors are then immediately seen to satisfy the resolution of the identity,

$$ \int_X |x\rangle\langle x|\; \mathcal{N}(x)\, d\mu(x) = I_{\mathfrak{H}_K} . \tag{3.4} $$

This condition implies that the vectors $|x\rangle$ form an overcomplete set in $\mathfrak{H}_K$, so that any
vector in it can be written as a linear combination, either as a sum of or an integral over
these. Very often such a set of vectors is associated to a unitary representation of some
group, and is constructed by letting the representation operators act on a fixed vector
in $\mathfrak{H}_K$. At other times such vectors are obtained by exploiting analytic properties of
vectors in $\mathfrak{H}_K$. But at this point, we prefer to adopt a more general point of view and to
just focus on the reproducing kernel Hilbert space structure. We shall call the vectors
$|x\rangle$ (generalized) coherent states (see, for example, [1], for a detailed discussion).

It is possible to associate two types of probability distributions to the basis vectors
in a reproducing kernel Hilbert space. First, writing

$$ P(n, x) = \frac{|\Phi_n(x)|^2}{\mathcal{N}(x)}\,, \qquad n = 0, 1, 2, \ldots, N, \tag{3.5} $$

we see that $\sum_{n=0}^{N} P(n, x) = 1$. Thus, $P(n, x)$ can be looked upon as a discrete probability
distribution with parameter $x$. For instance, it can be based upon some experimental
setup and then might be viewed as a stochastic model. Secondly, if $X \subset \mathbb{R}^m$, and if $d\mu$
has a Radon-Nikodym density with respect to the Lebesgue measure $dx$ (on $\mathbb{R}^m$), then
the functions,

$$ \Psi_n(x) = |\Phi_n(x)|^2\, \frac{d\mu}{dx} = P(n, x)\, \mathcal{N}(x)\, \frac{d\mu}{dx}\,, \qquad n = 0, 1, 2, \ldots, N, \tag{3.6} $$

define, for each $n$, a continuous probability density on $X$, since $\int_X \Psi_n(x)\, dx = 1$. In
the context of Bayesian statistics, this could be thought of as a conditional probability
density for $x$, given $n$. If $P(n, x)$ is a statistical distribution, corresponding to some
physical situation, which depends on the parameter $x$, the measure

$$ d\kappa(x) = \mathcal{N}(x)\, d\mu(x) \tag{3.7} $$

can be interpreted as a prior measure on the parameter space $X$ and then the $\Psi_n(x)$
become the associated posterior distributions, in conformity with (2.1). In [16, 17, 18], a
group theoretical argument, exploiting the invariant measure and coherent states related
to a particular representation of the group on a Hilbert space, was invoked to obtain
the prior measure. Here we see that the appearance of a discrete probability distribution
$P(n, x)$ and the continuous probability distributions $\Psi_n(x)$ in this dual relationship is
embodied in the structure of the coherent states $|x\rangle$, independently of any group action.



III.1 A generic example

As a particular example of the above situation, which will be useful for the purposes
of the present paper, and which will turn out to have rich applications to statistical
distributions encountered in extensive physical contexts, we introduce a family of the
so-called non-linear coherent states. These are built by taking an abstract, complex,
separable Hilbert space $\mathfrak{H}$, of dimension $N$ (finite or infinite), choosing an orthonormal
basis $\phi_k$, $k = 0, 1, 2, \ldots, N$, of it and defining on it the vectors

$$ |z\rangle = \mathcal{N}(|z|^2)^{-1/2} \sum_{k=0}^{N} \frac{z^k}{\sqrt{x_k!}}\; \phi_k\,, \tag{3.8} $$

where $z$ is a parameter drawn from some appropriate open subset of $\mathbb{C}$ and $x_1, x_2, x_3, \ldots$
is a conveniently chosen positive sequence of numbers for which we define the generalized
factorial, $x_k! = x_1 x_2 \cdots x_k$, with $x_0! = 1$, by definition. The normalization factor in this
case is $\mathcal{N}(|z|^2) = \sum_{k=0}^{N} \frac{|z|^{2k}}{x_k!}$ and of course, $\langle z \,|\, z\rangle = 1$. In order to ensure that these
coherent states form an overcomplete set of vectors in the Hilbert space $\mathfrak{H}$, one requires
the resolution of the identity,

$$ \int_{\mathcal{D}} |z\rangle\langle z|\; \mathcal{N}(|z|^2)\, d\nu(z, \bar{z}) = I_{\mathfrak{H}}\,, \tag{3.9} $$

to hold, where $I_{\mathfrak{H}}$ is the identity operator on the Hilbert space $\mathfrak{H}$ and $\mathcal{D}$ is an appropriate
domain of the complex plane (usually the open unit disc or an open annulus, but which
could also be the entire plane). It is not hard to see that the resolution of the identity
(3.9) will hold if the measure $d\nu$, which is usually of the type $d\varrho(r)\, d\theta$ (for $z = r e^{i\theta}$),
is such that $d\varrho$ is related to the $x_k!$ through the following moment condition (see, for
example, [29] for a discussion of the moment problem):

$$ \frac{x_k!}{2\pi} = \int_0^{\sqrt{L}} r^{2k}\, d\varrho(r)\,, \qquad k = 0, 1, 2, \ldots, \tag{3.10} $$

$L$ being the radius of convergence of the series $\sum_{k=0}^{\infty} \frac{\lambda^k}{x_k!}$ (considered as a series in $\lambda = |z|^2$).
This means that once the sequence $x_1, x_2, x_3, \ldots$ is specified, the measure $d\varrho$ is to
be determined by solving the moment problem (3.10). There is an extensive literature
on the construction of coherent states of this type (see, for example, [10, 22, 23, 25]).
On the other hand, if the moment problem has no solution, or it has a solution but the
corresponding measure is not explicitly known, there exists an alternative constructive
procedure which allows one to build non-linear coherent states, again resolving the
identity [5].
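
As a hedged numerical illustration of the moment condition, take the simplest sequence $x_k = k$, so that $x_k! = k!$. The radial measure used below, $d\varrho(r) = \pi^{-1} e^{-r^2} r\, dr$, is the standard one for the canonical coherent states and is introduced here as an assumption of the sketch; the function name `moment` and the cutoff `r_max` are hypothetical. The code checks that $2\pi \int_0^\infty r^{2k}\, d\varrho(r) = k!$.

```python
import math

def moment(k, r_max=12.0, steps=100_000):
    """Midpoint-rule estimate of 2*pi * integral_0^r_max r**(2k) d(rho)(r),
    with the assumed measure d(rho)(r) = exp(-r**2) * r / pi dr."""
    h = r_max / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += r ** (2 * k) * math.exp(-r * r) * r / math.pi
    return 2.0 * math.pi * total * h

for k in range(6):
    # the two printed columns should agree: numerical moment vs. k!
    print(k, moment(k), math.factorial(k))
```

For other sequences $x_k$ the same check applies once a candidate measure is known; finding that measure is the moment problem itself and is not addressed by this sketch.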

We proceed now to analyze the discrete and continuous probability distributions, in 
the sense of the previous section, associated to these coherent states. 



III.2 Discrete distribution associated to $|z\rangle$

With $\lambda = |z|^2$, define the discrete probability distribution $P(n, \lambda)$, $n = 0, 1, 2, \ldots, N$,
by

$$ P(n, \lambda) = \frac{\lambda^n}{x_n!}\; \mathcal{N}(\lambda)^{-1} . \tag{3.11} $$

The normalization condition $\langle z \,|\, z\rangle = 1$ is seen to imply that

$$ \sum_{n=0}^{N} P(n, \lambda) = 1 . \tag{3.12} $$

In the special case where $x_n = n$, this distribution is just the well-known Poisson
distribution, for then $x_n! = n!$, $\mathcal{N}(\lambda) = e^{\lambda}$ and $L = \infty$. We shall see later that many of
the well-known discrete statistical distributions are related to nonlinear coherent states
in this manner. Note that if $Y$ denotes the discrete random variable, $Y(n) = x_n$, then
taking $x_0 = 0$, we obtain its expectation value,

$$ \langle Y \rangle = \sum_{n=0}^{N} x_n\, P(n, \lambda) = \lambda . \tag{3.13} $$

Thus for each $\lambda$ we get a discrete probability distribution, which is some sort of a
generalized Poisson distribution. In general, the sort of distributions given by (3.11) are
of the power series type, well-known in statistics (see, for example, [21]).
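
The construction of this subsection can be sketched numerically. The helper names `generalized_factorials` and `discrete_distribution` are hypothetical, and the infinite sum is truncated at `n_max`, which is an approximation made for the sketch (the expectation-value identity is only approached as `n_max` grows). With $x_n = n$ the output should reproduce the Poisson distribution.

```python
import math

def generalized_factorials(x, n_max):
    """Return [x_0!, x_1!, ..., x_{n_max}!], with x_0! = 1 and x_k! = x_1 * ... * x_k."""
    facts = [1.0]
    for k in range(1, n_max + 1):
        facts.append(facts[-1] * x(k))
    return facts

def discrete_distribution(x, lam, n_max):
    """P(n, lam) = lam**n / (x_n! * N(lam)), with N(lam) truncated at n_max."""
    facts = generalized_factorials(x, n_max)
    terms = [lam ** n / facts[n] for n in range(n_max + 1)]
    norm = sum(terms)                      # truncated normalization N(lam)
    return [t / norm for t in terms]

x = lambda k: float(k)                     # x_n = n gives the Poisson case
lam = 2.5
probs = discrete_distribution(x, lam, n_max=80)
mean_y = sum(x(n) * p for n, p in enumerate(probs))   # <Y> = sum_n x_n P(n, lam)
print(sum(probs), mean_y)
```

Replacing `x` by another positive sequence gives the corresponding generalized distribution, provided `lam` lies inside the radius of convergence $L$.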



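III.3 Continuous distributions associated to $|z\rangle$

We next note that in view of (3.10),

$$ 2\pi \int_0^{L} P(n, \lambda)\, \mathcal{N}(\lambda)\, d\varrho(\lambda) = 1\,, \qquad n = 0, 1, 2, \ldots, N, $$

where we have written

$$ d\varrho(\lambda) = d\varrho(r)\,, \qquad r^2 = \lambda . \tag{3.14} $$

Thus, the functions,

$$ \Psi_n(\lambda) = 2\pi\, P(n, \lambda)\, \mathcal{N}(\lambda)\, \frac{d\varrho(\lambda)}{d\lambda} = 2\pi\, \frac{\lambda^n}{x_n!}\, \frac{d\varrho(\lambda)}{d\lambda}\,, \qquad n = 0, 1, 2, \ldots, \tag{3.15} $$

define, for each $n$, a continuous probability density over the parameter space $0 \le \lambda < L$.
Here, $\frac{d\varrho(\lambda)}{d\lambda}$ denotes the Radon-Nikodym derivative of the measure $d\varrho$ with respect to
the Lebesgue measure $d\lambda$, provided it exists. Clearly,

$$ \int_0^{L} \Psi_n(\lambda)\, d\lambda = 1\,, \qquad n = 0, 1, 2, \ldots . \tag{3.16} $$

From (3.12) it follows that

$$ \sum_{n=0}^{N} \Psi_n(\lambda) = 2\pi\, \mathcal{N}(\lambda)\, \frac{d\varrho(\lambda)}{d\lambda} < \infty\,, \tag{3.17} $$

for almost all $\lambda \in [0, L]$. Also, if $\Lambda$ is the continuous random variable over the parameter
space $[0, L]$, such that $\Lambda(\lambda) = \lambda$, then

$$ \langle \Lambda \rangle_n = \int_0^{L} \lambda\, \Psi_n(\lambda)\, d\lambda = x_{n+1}\,, \tag{3.18} $$

which is a dual relation to (3.13).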
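
A minimal numerical check of the normalization and of the dual moment relation, under the assumption $x_n = n$ with $d\varrho(\lambda) = (2\pi)^{-1} e^{-\lambda}\, d\lambda$ (the canonical case, chosen here only for illustration). In that case the continuous density reduces to $\lambda^n e^{-\lambda}/n!$, and its mean should equal $x_{n+1} = n + 1$. The helper names and the integration cutoff `upper` are hypothetical.

```python
import math

def psi(n, lam):
    """Assumed example density: lam**n * exp(-lam) / n!  (the case x_n = n)."""
    return lam ** n * math.exp(-lam) / math.factorial(n)

def integrate(f, a, b, steps=100_000):
    """Plain midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def check(n, upper=80.0):
    """Return (normalization, mean) of psi(n, .) on [0, upper]."""
    norm = integrate(lambda t: psi(n, t), 0.0, upper)
    mean = integrate(lambda t: t * psi(n, t), 0.0, upper)
    return norm, mean

for n in range(4):
    print(n, check(n))   # normalization close to 1, mean close to n + 1
```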
Finally note that, in terms of the discrete and continuous probability distributions
themselves, the coherent states (3.8) may be written as

$$ |z\rangle = \sum_{n=0}^{N} [P(n, \lambda)]^{1/2}\, e^{-in\theta}\, \phi_n = \left[ 2\pi\, \mathcal{N}(\lambda)\, \frac{d\varrho(\lambda)}{d\lambda} \right]^{-1/2} \sum_{n=0}^{N} [\Psi_n(\lambda)]^{1/2}\, e^{-in\theta}\, \phi_n\,, \qquad z = \sqrt{\lambda}\, e^{-i\theta}\,, \tag{3.19} $$

and which satisfy the resolution of the identity,

$$ \int_0^{L} \int_0^{2\pi} |z\rangle\langle z|\; \mathcal{N}(\lambda)\, d\varrho(\lambda)\, d\theta = I_{\mathfrak{H}} . \tag{3.20} $$

Comparing (2.1) and (3.15) we see that the measure

$$ d\kappa(\lambda) = 2\pi\, \mathcal{N}(\lambda)\, d\varrho(\lambda)\,, \tag{3.21} $$

gives a prior measure on the parameter space $[0, L]$. Furthermore, these results give us a
hint as to how one might construct coherent states starting from families of probability
distributions.

We emphasize again that the duality appearing here, between the family of discrete
probability distributions, $n \mapsto P(n, \lambda)$, parametrized by $\lambda$ and the family of continuous
distributions $\lambda \mapsto \Psi_n(\lambda)$, parametrized by $n$, is analogous to the Bayesian duality that
we already referred to at the end of Section II.2, between a discrete probabilistic model
$P(n, \lambda)$ and the continuous probability density function (see also the Appendix to this
paper), and which is captured in the relation,

$$ f(\lambda, n) = \frac{P(n, \lambda)\, \Pi(\lambda)}{\int P(n, \lambda')\, \Pi(\lambda')\, d\lambda'}\,, \tag{3.22} $$

where $n$ represents an experimentally realized value of the discrete random variable and
this conditional density function (Bayesian posterior density function) is obtained using
the prior measure $\Pi(\lambda)\, d\lambda$ (see, for example, [6]).

It is interesting to note that the coherent states $|z\rangle$, which are unit vectors in the
Hilbert space $\mathfrak{H}$, may be thought of as being square roots of the discrete probability
distribution function $n \mapsto P(n, \lambda)$, in the sense that $\|\, |z\rangle \|^2 = \sum_{n=0}^{N} P(n, \lambda) = 1$.
The probability distribution $P(n, \lambda)$ can be extracted from the coherent state $|z\rangle$ by
taking the trace:

$$ P(n, \lambda) = \mathrm{Tr}\left[\, |z\rangle\langle z|\; \mathbb{P}_n \right] = |\langle \phi_n \,|\, z\rangle|^2\,, \tag{3.23} $$

where $\mathbb{P}_n = |\phi_n\rangle\langle \phi_n|$. In a quantum mechanical interpretation, this $P(n, \lambda)$ is the
probability of measuring the physical quantity encoded by the state $\phi_n$ when the system
under observation had been prepared in the state $|z\rangle$.



III.4 Coherent states from discrete statistical distributions

Suppose now that we start with a discrete probability distribution, $P(n, \lambda)$, where again
$n = 0, 1, 2, \ldots, N$, with $N$ being either finite or infinite and $\lambda$ is a parameter drawn from
the interval $[a, b] \subset [0, \infty)$. Of course, $\sum_{n=0}^{N} P(n, \lambda) = 1$ and we further assume that
$P(n, \lambda)$ satisfies the conditions:

1. There exists a measure $d\kappa$ on $[a, b]$, absolutely continuous with respect to the
Lebesgue measure $d\lambda$ and such that

$$ \int_a^b P(n, \lambda)\, d\kappa(\lambda) := c_n < \infty\,, \qquad n = 0, 1, 2, \ldots, N . \tag{3.24} $$

2. For all $\lambda \in [a, b]$,

$$ \sum_{n=0}^{N} \frac{P(n, \lambda)}{c_n} < \infty . \tag{3.25} $$

On the interval $[a, b]$, let us define the functions

$$ \Psi_n(\lambda) = \frac{1}{c_n}\, P(n, \lambda)\, \frac{d\kappa(\lambda)}{d\lambda}\,, \tag{3.26} $$

for which we note that

$$ \int_a^b \Psi_n(\lambda)\, d\lambda = 1\,, \qquad n = 0, 1, 2, \ldots, N, \tag{3.27} $$

and using them we define on the open annulus,

$$ \mathcal{D} = \{ z = \sqrt{\lambda}\, e^{-i\theta} \mid a < \lambda < b\,,\; 0 \le \theta < 2\pi \} \subset \mathbb{C}\,, \tag{3.28} $$

the functions

$$ \Phi_n(z) = \frac{1}{\sqrt{2\pi}}\, [\Psi_n(\lambda)]^{1/2}\, e^{in\theta}\,, \qquad n = 0, 1, 2, \ldots, N . \tag{3.29} $$

Note that the range of values of the index $n$ need not be constrained to lie among
the nonnegative integers only. It could also be a subset of $\mathbb{Z}$ or all of it.

It is worthwhile pointing out that the measure $d\kappa$ postulated in (3.24) is not
necessarily unique, which leaves the possibility of there being several such measures which
could be acceptable. In the case of the discrete distributions arising from non-linear
coherent states, the requirement of the resolution of the identity, i.e., the moment
condition (3.10), fixes the measure $d\kappa$. Also the functions (3.26) are exactly like the $f(\lambda, n)$
in (3.22), appearing in the duality studied in Bayesian statistics [6, 21], although, unlike
in that case, we have here the additional restriction (3.25).

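Clearly, the functions $\{\Phi_n\}_{n=0}^{N}$ form an orthonormal set:

$$ \int_{\mathcal{D}} \overline{\Phi_m(z)}\, \Phi_n(z)\; d\lambda\, d\theta = \delta_{mn} . \tag{3.30} $$

Let $\mathfrak{H}$ denote the Hilbert subspace of $L^2(\mathcal{D}, d\lambda\, d\theta)$ generated by these functions.
Since,

$$ \sum_{n=0}^{N} |\Phi_n(z)|^2 = \frac{1}{2\pi}\, \frac{d\kappa(\lambda)}{d\lambda} \sum_{n=0}^{N} \frac{P(n, \lambda)}{c_n} < \infty\,, \tag{3.31} $$

by virtue of (3.25), $\mathfrak{H}$ is a reproducing kernel Hilbert space. From the discussion at the
beginning of this section (see (3.3)), we can then define coherent states in $\mathfrak{H}$ as:

$$ |z\rangle = \mathcal{N}(\lambda)^{-1/2} \sum_{n=0}^{N} \left[ \frac{P(n, \lambda)}{c_n} \right]^{1/2} e^{-in\theta}\, \Phi_n\,, \qquad \mathcal{N}(\lambda) = \sum_{n=0}^{N} \frac{P(n, \lambda)}{c_n}\,, \tag{3.32} $$

which now satisfy the resolution of the identity

$$ \frac{1}{2\pi} \int_a^b \int_0^{2\pi} |z\rangle\langle z|\; \mathcal{N}(\lambda)\, d\kappa(\lambda)\, d\theta = I_{\mathfrak{H}} . \tag{3.33} $$

Note that from (3.24) and (3.26), we get

$$ \Psi_n(\lambda) = \frac{P(n, \lambda)\, \Pi(\lambda)}{\int_a^b P(n, \lambda')\, \Pi(\lambda')\, d\lambda'}\,, \qquad \text{where} \quad \Pi(\lambda) = \frac{d\kappa(\lambda)}{d\lambda}\,, \tag{3.34} $$

so that $d\kappa$ can be thought of (see (3.22)) as a prior measure on the parameter space
$a \le \lambda \le b$ and the $\Psi_n$ as the associated Bayesian posteriors.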

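To make the connection with (3.3) and (3.4), we easily see that the coherent states
(3.32) can also be written as

$$ |z\rangle = \mathcal{N}(|z|^2)^{-1/2} \sum_{n=0}^{N} \overline{\Phi_n(z)}\, \Phi_n\,, \qquad \mathcal{N}(|z|^2) = \sum_{n=0}^{N} |\Phi_n(z)|^2\,, \tag{3.35} $$

and the resolution of the identity as

$$ \int_a^b \int_0^{2\pi} |z\rangle\langle z|\; \mathcal{N}(|z|^2)\, d\lambda\, d\theta = I_{\mathfrak{H}} . \tag{3.36} $$
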



III.5 Coherent states from continuous statistical distributions

We now proceed to construct analogous families of coherent states from sets of
continuous probability distributions. Suppose that $\Psi_n(\lambda)$, $n = 0, 1, 2, \ldots, N$, is a set of
continuous probability densities defined over the set $I \subset \mathbb{R}$. Evidently, they satisfy

$$ \int_I \Psi_n(\lambda)\, d\lambda = 1\,, \qquad n = 0, 1, 2, \ldots, N . $$

We assume in addition that

$$ \mathcal{N}(\lambda) := \frac{1}{2\pi} \sum_{n=0}^{N} \Psi_n(\lambda) < \infty\,, \qquad \lambda \in I . \tag{3.37} $$

Then, as before, we construct the set of functions on $X = I \times [0, 2\pi)$:

$$ \Phi_n(\lambda, \theta) = \frac{1}{\sqrt{2\pi}}\, [\Psi_n(\lambda)]^{1/2}\, e^{in\theta}\,, \qquad n = 0, 1, 2, \ldots, N, \tag{3.38} $$

and note that they form an orthonormal set in $L^2(X, d\lambda\, d\theta)$. Let $\mathfrak{H}$ be the Hilbert
subspace of $L^2(X, d\lambda\, d\theta)$ generated by these vectors. Then once again, following (3.3),
we construct the coherent states in $\mathfrak{H}$:

$$ |\lambda, \theta\rangle = \mathcal{N}(\lambda)^{-1/2} \sum_{n=0}^{N} \overline{\Phi_n(\lambda, \theta)}\; \Phi_n\,, \tag{3.39} $$

n=0 
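with $\mathcal{N}(\lambda)$ as in (3.37). These coherent states satisfy the resolution of the identity,

$$ \int_I \int_0^{2\pi} |\lambda, \theta\rangle\langle \lambda, \theta|\; \mathcal{N}(\lambda)\, d\lambda\, d\theta = I_{\mathfrak{H}} . \tag{3.40} $$

Clearly, the discrete distribution function this time is

$$ P(n, \lambda) = \frac{|\Phi_n(\lambda, \theta)|^2}{\mathcal{N}(\lambda)} = \frac{\Psi_n(\lambda)}{2\pi\, \mathcal{N}(\lambda)}\,, \tag{3.41} $$

with $\mathcal{N}(\lambda)\, d\lambda$ the prior measure.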



IV Some illustrative examples 

In this section we construct coherent states for some standard statistical distributions,
following the general procedure outlined above. These coherent states have been
obtained before, using group theoretical arguments [16, 17, 18], and we shall indicate, in
each case, the group theoretic relevance of the coherent states. Moreover, in each case
the interplay between the dual system of discrete and continuous distributions, embodied
in the coherent states, will be explicitly demonstrated.

IV.1 Coherent states from the Poisson distribution

For the Poisson distribution, the probability of $n$ successes, given that the average
number of successes is $\lambda > 0$, is

$$ P(n, \lambda) = \frac{\lambda^n e^{-\lambda}}{n!} \qquad \text{and} \qquad \sum_{n=0}^{\infty} P(n, \lambda) = 1 . \tag{4.1} $$

Once again we would like to relate these to a family of coherent states. Also, thinking of
$\lambda$ itself as a random variable, we would like to obtain a distribution function for it. We
start by introducing the complex variable, $z = \sqrt{\lambda}\, e^{-i\theta}$, and since $\int_0^{\infty} P(n, \lambda)\, d\lambda = 1$
for all $n$, we define the functions (see (3.29))

$$ \Phi_n(z) = \frac{1}{\sqrt{2\pi}}\, [P(n, \lambda)]^{1/2}\, e^{in\theta} = \frac{1}{\sqrt{2\pi}} \left[ \frac{\lambda^n e^{-\lambda}}{n!} \right]^{1/2} e^{in\theta}\,, \qquad n = 0, 1, 2, \ldots, \infty . \tag{4.2} $$

These functions are clearly orthonormal with respect to the measure $d\lambda\, d\theta$:

$$ \int_0^{\infty} \int_0^{2\pi} \overline{\Phi_m(z)}\, \Phi_n(z)\; d\lambda\, d\theta = \delta_{mn} . $$

Let $\mathfrak{H} \subset L^2(\mathbb{C}, d\lambda\, d\theta)$ be the (infinite dimensional separable) Hilbert space generated
by them. Next we see that conditions (3.24) and (3.25) are satisfied with $d\kappa = d\lambda$ and
$c_n = 1$ for all $n$. Thus, following (3.32) we may define coherent states on $\mathfrak{H}$ as

$$ |z\rangle = \sum_{n=0}^{\infty} \sqrt{P(n, \lambda)}\; e^{-in\theta}\, \Phi_n = e^{-\frac{|z|^2}{2}} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\; \Phi_n\,, \tag{4.3} $$

so that,

$$ \langle z \,|\, z\rangle = \sum_{n=0}^{\infty} P(n, \lambda) = 1 . $$

Again, the coherent states $|z\rangle$ may be thought of as being square roots of the discrete
Poisson distribution, $n \mapsto P(n, \lambda)$.

These coherent states also satisfy a resolution of the identity:

$$ \frac{1}{2\pi} \int_0^{\infty} \int_0^{2\pi} |z\rangle\langle z|\; d\lambda\, d\theta = I_{\mathfrak{H}} . \tag{4.4} $$

It is clear that this time the prior measure on the parameter space $0 \le \lambda < \infty$ is
just the uniform measure $d\lambda$, with the Bayesian posteriors being given by $\Psi_n(\lambda) =
P(n, \lambda)$. The coherent states (4.3) are the canonical coherent states, well known in the
physical literature (see, e.g., [1]). Moreover, these coherent states are associated to a
unitary representation of the Weyl-Heisenberg group and the prior measure $d\lambda$ is also
obtainable from the Haar measure of this group [18].



Finally, it ought to be pointed out that the continuous distribution given by the
function $\Psi_n(\lambda) = P(n, \lambda)$ is just a $\gamma$-distribution, for each $n$. In other words, the discrete
Poisson distribution and the continuous $\gamma$-distributions (which may now be thought of
as being conditional distributions for the average number of successes $\lambda$, given $n$ successes)
are in duality through the canonical coherent states. Moreover, had we started with the
$\gamma$-distribution functions, $\gamma_n(\lambda) = \frac{\lambda^{n-1} e^{-\lambda}}{\Gamma(n)}$, defined $\Psi_n = \gamma_{n+1}$, $n = 0, 1, 2, \ldots, \infty$, and
followed through the steps in Section III.5, we would have arrived at the same coherent
states (4.3). In the field of statistics, the gamma distribution is said to be a natural
conjugate to the Poisson sampling process [TJ. 



IV.2 Coherent states from the binomial distribution

Consider the binomial distribution for $N$ independent trials, each having a probability
of success $p$ and of failure $q = 1 - p$. The probability of getting $n$ successes in these $N$
trials is

$$ P(n, p) = \binom{N}{n}\, p^n q^{N-n} = \frac{N!}{(N-n)!\, n!}\; p^n q^{N-n}\,, \qquad n = 0, 1, 2, \ldots, N, \tag{4.5} $$

and of course,

$$ \sum_{n=0}^{N} P(n, p) = (q + p)^N = 1 . $$

As before, we treat the parameter $p$ itself also as a random variable and then use our
general construction in order to: (1) obtain coherent states representing this distribution
and (2) find a posterior distribution for $p$. This case has also been worked out in [17],
using coherent states of the rotation group, and we shall indicate the connection to this
approach in the sequel. Let us first introduce a new parameter $\lambda$, which will be more
convenient for our purposes:

$$ \lambda = \frac{p}{q} \;\Longrightarrow\; q = \frac{1}{1 + \lambda} \qquad \text{and} \qquad 0 \le \lambda < \infty . \tag{4.6} $$

Using this we introduce the complex variable $z = \sqrt{\lambda}\, e^{-i\theta}$ and note that in terms of $\lambda$,
the probability distribution (4.5) can be rewritten as

$$ P(n, \lambda) = \frac{N!}{(N-n)!\, n!}\; \frac{\lambda^n}{(1 + \lambda)^N} = \frac{\Gamma(N+1)}{\Gamma(N-n+1)\, \Gamma(n+1)}\; \frac{|z|^{2n}}{(1 + |z|^2)^N} . \tag{4.7} $$

Since

$$ (N+1) \int_0^{\infty} \frac{\lambda^n}{(1 + \lambda)^{N+2}}\, d\lambda = (N+1) \int_0^1 q^{N-n} (1 - q)^n\, dq = \frac{n!\, (N-n)!}{N!}\,, \tag{4.8} $$

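we take (see (3.24))

$$ d\kappa(\lambda) = \frac{N+1}{(1 + \lambda)^2}\, d\lambda \qquad \text{and} \qquad c_n = 1 . $$

Since $N$ is finite, (3.25) is trivially satisfied. Thus, we take

$$ \Psi_n(\lambda) = P(n, \lambda)\, \frac{d\kappa(\lambda)}{d\lambda} = \frac{(N+1)!}{(N-n)!\, n!}\; \frac{\lambda^n}{(1 + \lambda)^{N+2}} \tag{4.9} $$

and

$$ \Phi_n(z) = \left[ \frac{(N+1)!}{2\pi\, (N-n)!\, n!} \right]^{1/2} \frac{\bar{z}^{\,n}}{(1 + |z|^2)^{\frac{N}{2} + 1}}\,, \qquad z \in \mathbb{C}\,, \quad n = 0, 1, 2, \ldots, N . \tag{4.10} $$

Clearly, these vectors are orthonormal:

$$ \int_0^{\infty} \int_0^{2\pi} \overline{\Phi_m(z)}\, \Phi_n(z)\; d\lambda\, d\theta = \delta_{mn}\,, $$

and we denote by $\mathfrak{H}$ the $(N+1)$-dimensional Hilbert space generated by these vectors.
On this space we then have the coherent states,

$$ |z\rangle = \sum_{n=0}^{N} \sqrt{P(n, \lambda)}\; e^{-in\theta}\, \Phi_n = \frac{1}{(1 + |z|^2)^{\frac{N}{2}}} \sum_{n=0}^{N} \left[ \frac{\Gamma(N+1)}{\Gamma(N-n+1)\, \Gamma(n+1)} \right]^{1/2} z^n\, \Phi_n . \tag{4.11} $$

Note again, that since

$$ \langle z \,|\, z\rangle = 1 = \sum_{n=0}^{N} P(n, \lambda) $$

for each $\lambda = |z|^2$, the coherent state $|z\rangle$ is sort of a vectorial square root of the probability
distribution $P(n, \lambda)$, $n = 0, 1, 2, \ldots, N$. These coherent states satisfy the resolution of
the identity,

$$ \frac{1}{2\pi} \int_0^{\infty} \int_0^{2\pi} |z\rangle\langle z|\; d\kappa(\lambda)\, d\theta = \frac{N+1}{2\pi} \int_0^{\infty} \int_0^{2\pi} |z\rangle\langle z|\; \frac{d\lambda\, d\theta}{(1 + \lambda)^2} = I_{\mathfrak{H}} . \tag{4.12} $$
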



Next, introducing the new labels $N = 2j$, $k = n - j$, we write

$$ |z\rangle = \frac{1}{(1 + |z|^2)^{j}} \sum_{k=-j}^{j} \left[ \frac{\Gamma(2j + 1)}{\Gamma(j - k + 1)\, \Gamma(j + k + 1)} \right]^{1/2} z^{j+k}\, \Phi_k\,, \tag{4.13} $$

which are immediately recognized as being the Gilmore-Perelomov-Radcliffe type
coherent states [U EEJ ES H] for the $(2j + 1)$-dimensional representation of $SU(2)$. Indeed, the vectors $|z\rangle$
may be rewritten in terms of the $SU(2)$ generators $J_{\pm}$, $J_3$ and the lowest basis vector $\Phi_{-j}$
as:

\ z ) = e zJ+ e^ 3 e-* J -$-3 = e^+'^-^j := £>(0$-j , (4-14) 
where, writing z = — tan | e - * 7 , 

£ = i— e 11 and 77 = log(l + \z\ 2 ) = 2 log sec — . 

Finally, note that by virtue of (4.7) and (4.9), the measure

$$ d\kappa(\lambda) = \frac{N+1}{(1 + \lambda)^2}\, d\lambda \qquad \text{or equivalently,} \qquad d\kappa(p) = (N+1)\, dp\,, \tag{4.15} $$

gives in this case the prior measure (again uniform) of the parameter $p$ over the interval
$[0, 1]$.

Once again, it is clear that had we started with the continuous distributions (4.9),
which are $\beta$-distributions of the first kind, and followed through with the procedure in
Section III.5, we would also have arrived at the coherent states (4.11). Thus, the
continuous $\beta$-distributions of the first kind and the discrete binomial distribution (statistical
conjugate pair) are in duality through the $SU(2)$ coherent states.
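
The uniform prior $(N+1)\, dp$ and the resulting $\beta$-type posteriors can be checked with exact rational arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the Beta integral is evaluated by expanding $(1-p)^b$ with the binomial theorem rather than by any method taken from the paper.

```python
from fractions import Fraction
from math import comb

def beta_integral(a, b):
    """Exact integral of p**a * (1-p)**b over [0, 1] for integers a, b >= 0,
    obtained by expanding (1-p)**b with the binomial theorem."""
    return sum(Fraction((-1) ** k * comb(b, k), a + k + 1) for k in range(b + 1))

def posterior_norm(N, n):
    """(N+1) * integral of C(N, n) p**n (1-p)**(N-n) dp; expected to equal 1."""
    return (N + 1) * comb(N, n) * beta_integral(n, N - n)

def posterior_mean(N, n):
    """Mean of p under the density (N+1) C(N, n) p**n (1-p)**(N-n)."""
    return (N + 1) * comb(N, n) * beta_integral(n + 1, N - n)

print([posterior_norm(10, n) for n in range(11)])
print(posterior_mean(10, 3))
```

Each posterior is normalized, and its mean works out to $(n+1)/(N+2)$, the familiar mean of a Beta$(n+1, N-n+1)$ density.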



IV.3 Coherent states from the negative binomial and
$\beta$-distributions

The negative binomial and the $\beta$-distributions have a dual relationship through the
coherent states arising from the discrete series representations of the $SU(1,1)$ group.
Recall that, for a fixed integer $m \ge 1$, the negative binomial distribution is given by,

$$ P(m, n; \lambda) = \frac{\Gamma(m + n)}{\Gamma(n + 1)\, \Gamma(m)}\; \lambda^m (1 - \lambda)^n\,, \qquad n = 0, 1, 2, \ldots, \infty, \tag{4.16} $$

where the parameter $\lambda$ lies in the interval $(0, 1)$. The quantity $P(m, n; \lambda)$ can be thought
of as being the probability that $m + n$ is the number of independent trials that are
necessary to obtain the result of $m$ successes (the $(m + n)$-th trial being a success) when
$\lambda$ is the probability of success in a single trial. The term negative binomial stems from
the fact that

$$ (1 - \lambda)^{-k} = \sum_{n=0}^{\infty} \frac{\Gamma(k + n)}{\Gamma(n + 1)\, \Gamma(k)}\; \lambda^n\,, $$

from which it also follows that

$$ \sum_{n=0}^{\infty} P(m, n; \lambda) = 1 . \tag{4.17} $$

The $\beta$-distribution is a continuous distribution, in the variable $\lambda \in [0, 1]$, with discrete
parameters $m, n = 1, 2, 3, \ldots, \infty$,

$$ \beta(\lambda; m, n) = \frac{1}{B(m, n)}\; \lambda^{m-1} (1 - \lambda)^{n-1}\,, \qquad \int_0^1 \beta(\lambda; m, n)\, d\lambda = 1\,, \tag{4.18} $$

where $B(m, n) = \Gamma(m)\, \Gamma(n) / \Gamma(m + n)$ is the beta function. We note that,

$$ \beta(\lambda; m + 1, n + 1) = \frac{P(m, n; \lambda)}{c_{m,n}}\,, \qquad \text{with} \quad c_{m,n} = \frac{m}{(m + n + 1)(m + n)}\,, \tag{4.19} $$

implying, by virtue of (4.18),

$$ \int_0^1 P(m, n; \lambda)\, d\lambda = c_{m,n} \qquad \text{and} \qquad d\kappa(\lambda) = d\lambda . \tag{4.20} $$

Thus, (3.24) is satisfied, with $c_n = c_{m,n}$, and (3.25) is also satisfied since,

$$ \sum_{n=0}^{\infty} \frac{P(m, n; \lambda)}{c_{m,n}} = \frac{m + 1}{\lambda^2} \sum_{n=0}^{\infty} P(m + 2, n; \lambda) = \frac{m + 1}{\lambda^2} < \infty\,, \tag{4.21} $$

by virtue of (4.17).
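
The three identities used above (normalization of the negative binomial distribution, its relation to the $\beta$-density through $c_{m,n}$, and the closed form of the weighted sum) can be verified numerically. The sketch below uses only the standard library; the function names, the sample values $m = 3$, $\lambda = 0.4$ and the truncation `n_terms` are choices made for illustration.

```python
import math

def neg_binomial(m, n, lam):
    """P(m, n; lam) = Gamma(m+n) / (Gamma(n+1) Gamma(m)) * lam**m * (1-lam)**n."""
    log_coeff = math.lgamma(m + n) - math.lgamma(n + 1) - math.lgamma(m)
    return math.exp(log_coeff + m * math.log(lam) + n * math.log1p(-lam))

def c(m, n):
    """c_{m,n} = m / ((m+n+1)(m+n))."""
    return m / ((m + n + 1) * (m + n))

def beta_density(lam, a, b):
    """Beta density with parameters (a, b), evaluated at lam in (0, 1)."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(lam) + (b - 1) * math.log1p(-lam) - log_b)

m, lam, n_terms = 3, 0.4, 400              # truncated sums; the tail is negligible here
total = sum(neg_binomial(m, n, lam) for n in range(n_terms))
weighted = sum(neg_binomial(m, n, lam) / c(m, n) for n in range(n_terms))
print(total, weighted, (m + 1) / lam ** 2)
```

The last two printed numbers should agree, which is the convergence condition that makes the coherent state construction possible in this case.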

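Thus, for fixed $m \ge 1$ and $n = 0, 1, 2, \ldots, \infty$, we define, using (3.26) and (4.19), the
continuous distributions,

$$ \Psi_n(\lambda) = \frac{1}{c_{m,n}}\, P(m, n; \lambda)\, \frac{d\kappa(\lambda)}{d\lambda} = \beta(\lambda; m + 1, n + 1) = \frac{1}{B(m + 1, n + 1)}\; \lambda^m (1 - \lambda)^n\,, \tag{4.22} $$

and the associated functions in the complex variable $\zeta = \sqrt{\lambda}\, e^{-i\theta}$, $0 < \lambda < 1$,
$0 \le \theta < 2\pi$,

$$ \Phi_{m,n}(\zeta) = \frac{1}{\sqrt{2\pi}}\, [\beta(\lambda; m + 1, n + 1)]^{1/2}\, e^{in\theta}\,, $$

which satisfy the orthonormality condition

$$ \int_0^1 \int_0^{2\pi} \overline{\Phi_{m,n}(\zeta)}\, \Phi_{m,k}(\zeta)\; d\lambda\, d\theta = \delta_{nk} . $$

Denoting by $\mathfrak{H}$ the (infinite dimensional separable) Hilbert space spanned by these
vectors, and noting that by (4.21),

$$ \mathcal{N}(\lambda) = \sum_{n=0}^{\infty} \frac{P(m, n; \lambda)}{c_{m,n}} = \frac{m + 1}{\lambda^2}\,, $$

we define the coherent states associated to the discrete negative binomial and continuous
$\beta$-distributions, on this space using (3.32):

$$ |\zeta, m\rangle = \mathcal{N}(\lambda)^{-1/2} \sum_{n=0}^{\infty} \left[ \frac{P(m, n; \lambda)}{c_{m,n}} \right]^{1/2} e^{-in\theta}\, \Phi_{m,n} = \sum_{n=0}^{\infty} \left[ \frac{\Gamma(m + n + 2)}{\Gamma(m + 2)\, \Gamma(n + 1)} \right]^{1/2} \lambda^{\frac{m}{2} + 1} (1 - \lambda)^{\frac{n}{2}}\, e^{-in\theta}\, \Phi_{m,n} . \tag{4.23} $$

These satisfy the resolution of the identity,

$$ \frac{m + 1}{2\pi} \int_0^1 \int_0^{2\pi} |\zeta, m\rangle\langle \zeta, m|\; \frac{d\lambda\, d\theta}{\lambda^2} = I_{\mathfrak{H}}\,, \tag{4.24} $$

while from (3.22), (4.20) and (4.22) we obtain the prior measure on the parameter space
$[0, 1]$:

$$ d\kappa(\lambda) = d\lambda . \tag{4.25} $$

Note that this measure is different from the one obtained in [16], which was derived
using a group theoretical argument. However, in the present case, $m = 1, 2, 3, \ldots$, while
in [16] the value $m = 1$ was excluded. The associated Bayesian posteriors this time are
the $\Psi_n$, $n = 0, 1, 2, \ldots, \infty$.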
Once again it is clear that if we start with the continuous $\beta$-distributions (4.18), and
construct coherent states following Section III.5 with $\Psi_n(\lambda) = \beta(\lambda; m+1, n+1)$, we
arrive at these same coherent states.

To make contact with the coherent states of the $SU(1,1)$ group let us introduce the
new complex variable $z = (1 - |\zeta|^2)^{1/2} e^{-i\theta} = (1-\lambda)^{1/2} e^{-i\theta}$ and write $m + 2 = 2j$. Then in
terms of this variable we get the coherent states

$$|z; j\rangle = (1 - |z|^2)^{j} \sum_{n=0}^{\infty} \left[\frac{\Gamma(2j+n)}{\Gamma(2j)\,\Gamma(n+1)}\right]^{1/2} z^n\,\Phi_{2j-2,\,n} . \tag{4.26}$$

These are the Gilmore-Perelomov type coherent states arising from the discrete series
representations [U [16], [26] of $SU(1,1)$. Since we are assuming that $m \geq 1$, the
representation corresponding to $j = 1$ does not appear here. We observe that in the
mathematical literature, these coherent states are usually written without the factor of
$(1 - |z|^2)^{j}$ appearing before the sum on the right hand side of (4.26). This is because,
unlike in our case, the Hilbert space for the discrete series representations of $SU(1,1)$ is
taken to be the one consisting of all holomorphic functions on the open unit disc of $\mathbb{C}$,
which are square-integrable with respect to the measure $\frac{2j-1}{\pi}(1 - |z|^2)^{2j-2}\,dx\,dy$,
where $z = x + iy$, and the factor is absorbed into the measure.

Note finally, that all three examples discussed here lead to coherent states of the
non-linear type (see (3.8)). To summarize, we have seen that the canonical coherent
states combine in duality the continuous $\gamma$-distributions with the Poisson distribution,
the coherent states of the $SU(2)$ group so combine the continuous $\beta$-distributions of
the first kind with the discrete binomial distribution and the coherent states obtained
from the discrete series representations of the $SU(1,1)$ group combine in duality the
continuous $\beta$-distributions with the discrete negative binomial distribution.
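The change of variable leading from (4.23) to (4.26) can be illustrated with a short numerical comparison of the two sets of expansion coefficients. This is a sketch under stated assumptions: $m$ is a positive integer (so that $2j = m + 2$ is an integer and the Gamma-function ratios reduce to binomial coefficients), and the function names are invented for the illustration.

```python
import cmath
from math import comb, sqrt


def beta_side_coefficient(m, n, lam, theta):
    """n-th coefficient of (4.23), written in the variables (lam, theta)."""
    return (sqrt(comb(m + n + 1, n)) * lam ** (m / 2 + 1)
            * (1.0 - lam) ** (n / 2) * cmath.exp(-1j * n * theta))


def su11_coefficient(two_j, n, z):
    """n-th coefficient of (4.26): (1-|z|^2)^j * sqrt(Gamma(2j+n)/(Gamma(2j) n!)) * z^n."""
    return ((1.0 - abs(z) ** 2) ** (two_j / 2)
            * sqrt(comb(two_j + n - 1, n)) * z ** n)


def max_mismatch(m, lam, theta, n_max=10):
    """Largest difference between the two coefficient families for n < n_max."""
    z = sqrt(1.0 - lam) * cmath.exp(-1j * theta)  # z = (1 - lam)^(1/2) e^{-i theta}
    return max(
        abs(beta_side_coefficient(m, n, lam, theta) - su11_coefficient(m + 2, n, z))
        for n in range(n_max)
    )


if __name__ == "__main__":
    print(max_mismatch(3, 0.4, 0.7))  # expected to be at rounding-error level
```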
V Vector and multidimensional coherent states
from probability distributions



So far we have considered only single discrete probability distributions and constructed
coherent states from them. We now look at a situation where several independently
distributed random variables are at play. It will turn out that the appropriate type of
coherent states to associate to such situations are vector coherent states (VCS) of the
type discussed in [3], [30] or multidimensional coherent states of the type studied in [24].

Let us take a discrete probability distribution $P(n, \lambda)$, $n = 0, 1, 2, \ldots, N$ (finite or
infinite). This is the probability distribution of the discrete random variable $\mathfrak{N}$ such that
$\mathfrak{N}(n) = n$, and we assume that it is of the type (3.11), i.e., the associated coherent states
are of the non-linear type. Assume now that we have $M$ such independent random
variables, distributed with parameters $\lambda_1, \lambda_2, \ldots, \lambda_M$, respectively, each drawn from the
interval $[0, L]$. Then

$$P(\lambda_1, \lambda_2, \ldots, \lambda_M; n) = \frac{1}{M} \sum_{i=1}^{M} P(n, \lambda_i) \tag{5.1}$$

is the probability of $n$ "successes" coming from any one of these processes when we are
indifferent to which one it comes from. We now ask if there is a natural set of coherent
states that could incorporate such a system of distributions, along the lines of what we
saw earlier. It will turn out that a Hilbert space over a matrix domain, consisting of
normal matrices, will be appropriate for the construction of such coherent states. Recall
that a normal matrix $\mathfrak{Z}$ is defined by the condition $\mathfrak{Z}^*\mathfrak{Z} = \mathfrak{Z}\mathfrak{Z}^*$ and if $\mathfrak{Z}$ is an $M \times M$
matrix, it can be diagonalized by means of a unitary matrix, i.e.,

$$\mathfrak{Z} = U\,\mathrm{diag}\,[z_1, z_2, \ldots, z_M]\,U^* , \tag{5.2}$$

where $U \in U(M)$ and the elements $z_i$, $i = 1, 2, 3, \ldots, M$, of the diagonal matrix are
complex numbers. Writing $z_i = \sqrt{\lambda_i}\,e^{-i\theta_i}$, let $\Omega$ denote the set of all such matrices for
which $0 \leq \lambda_i < L$, $i = 1, 2, 3, \ldots, M$. We next define the matrix valued functions on
the domain $\Omega$,

$$\Phi_n(\mathfrak{Z}) = \frac{\mathfrak{Z}^n}{\sqrt{x_n!}} , \qquad n = 0, 1, 2, \ldots, N , \tag{5.3}$$
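and on $\Omega$ we define the measure,

$$d\Omega(\mathfrak{Z}, \mathfrak{Z}^*) = d\nu(U) \prod_{i=1}^{M} d\varrho(\lambda_i)\,d\theta_i , \qquad \int_\Omega d\Omega(\mathfrak{Z}, \mathfrak{Z}^*) = 1 , \tag{5.4}$$

where $d\nu$ is the (normalized) invariant measure of $U(M)$ and $d\varrho$ is the measure
introduced in (3.30) and (3.31).

It then follows that the functions $\Phi_n$ satisfy the matrix orthogonality condition:

$$\int_\Omega \Phi_m(\mathfrak{Z})\,\Phi_n(\mathfrak{Z})^*\,d\Omega(\mathfrak{Z},\mathfrak{Z}^*) = \int_\Omega \frac{\mathfrak{Z}^m\,\mathfrak{Z}^{*n}}{\sqrt{x_m!\,x_n!}}\,d\Omega(\mathfrak{Z},\mathfrak{Z}^*) = I_M\,\delta_{mn} , \tag{5.5}$$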



where $I_M$ is the $M \times M$ identity matrix. Let $\{\chi^i\}_{i=1}^{M}$ be an orthonormal basis of $\mathbb{C}^M$
and define the $\mathbb{C}^M$-valued functions,

$$\Phi^i_k(\mathfrak{Z}^*) = \Phi_k(\mathfrak{Z}^*)\,\chi^i . \tag{5.6}$$

Note that 



M 

2 



Z; 



i=l 

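Also, the series,

$$\sum_{n=0}^{N} \mathrm{Tr}\,[\Phi_n(\mathfrak{Z})^*\,\Phi_n(\mathfrak{Z})] = \sum_{n=0}^{N}\sum_{i=1}^{M} \Phi^i_n(\mathfrak{Z})^\dagger\,\Phi^i_n(\mathfrak{Z}) = \sum_{n=0}^{N}\sum_{i=1}^{M} \frac{\lambda_i^n}{x_n!} \tag{5.7}$$

converges for all $\lambda_i \in [0, L)$, which following the discussion at the beginning of Section
III is the condition for building reproducing kernel Hilbert spaces, which we now proceed
to do.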

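Consider the Hilbert space $\widetilde{\mathfrak{H}} = L^2_{\mathbb{C}^M}(\Omega, d\Omega)$ of square-integrable, $M$-component
vector-valued functions on $\Omega$. The vectors $\Phi^i_k$, $i = 1, 2, \ldots, M$, $k = 0, 1, 2, \ldots, N$ are
elements of this Hilbert space and in fact, by virtue of (5.5), they form an orthonormal
set in it:

$$\langle \Phi^i_k \mid \Phi^j_\ell \rangle = \int_\Omega \Phi^i_k(\mathfrak{Z}^*)^\dagger\,\Phi^j_\ell(\mathfrak{Z}^*)\,d\Omega(\mathfrak{Z},\mathfrak{Z}^*) = \delta_{ij}\,\delta_{k\ell} .$$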
Denote by $\mathfrak{H}_K$ the Hilbert subspace of $\widetilde{\mathfrak{H}}$ generated by this set of vectors. Then, in view
of the convergence of the series in (5.7),

$$\sum_{i,k} \|\Phi^i_k(\mathfrak{Z}^*)\|^2 < \infty , \qquad \forall\, \mathfrak{Z}^* \in \Omega .$$

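Thus, $\mathfrak{H}_K$ is a reproducing kernel Hilbert space of analytic functions in the variable $\mathfrak{Z}^*$,
with matrix valued kernel $K : \Omega \times \Omega \longrightarrow \mathbb{C}^{M \times M}$, given by (see (3.1))

$$K(\mathfrak{Z}^{*\prime}, \mathfrak{Z}) = \sum_{i,k} \Phi^i_k(\mathfrak{Z}^{*\prime})\,\Phi^i_k(\mathfrak{Z}^*)^\dagger = \sum_{i,k} \frac{\mathfrak{Z}^{*\prime k}\,\chi^i\chi^{i\dagger}\,\mathfrak{Z}^{k}}{x_k!} = \sum_{k} \frac{\mathfrak{Z}^{*\prime k}\,\mathfrak{Z}^{k}}{x_k!} . \tag{5.8}$$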

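When $M = 1$, $\mathfrak{Z} = z$, $\Omega = \mathbb{C}$ and $x_k! = k!$, we get the well-known Bargmann kernel,
and $\mathfrak{H}_K$ is the Hilbert space of entire analytic functions in the variable $z$. This is the
kernel associated to the canonical coherent states (4.3).

The vector coherent states associated to the reproducing kernel $K$ are (see (3.3)) the
vectors $|\mathfrak{Z}; i\rangle \in \mathfrak{H}_K$,

$$|\mathfrak{Z}; i\rangle(\mathfrak{Z}^{*\prime}) = K(\mathfrak{Z}^{*\prime}, \mathfrak{Z})\,\mathcal{N}(\mathfrak{Z}^*, \mathfrak{Z})^{-1/2}\,\chi^i , \qquad \mathcal{N}(\mathfrak{Z}^*, \mathfrak{Z}) = K(\mathfrak{Z}^*, \mathfrak{Z}) , \tag{5.9}$$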

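defined for each $\mathfrak{Z} \in \Omega$ and $i = 1, 2, \ldots, M$. Note that since $K(\mathfrak{Z}^*, \mathfrak{Z})$ is a strictly
positive-definite matrix:

$$K(\mathfrak{Z}^*, \mathfrak{Z}) = U\,\mathrm{diag}\,[\mathcal{N}(\lambda_1), \mathcal{N}(\lambda_2), \ldots, \mathcal{N}(\lambda_M)]\,U^* , \tag{5.10}$$

where for each $i$, $\mathcal{N}(\lambda_i) = \sum_{k=0}^{N} \lambda_i^k / x_k!$ is the same normalization factor as in (3.8), the
negative square root makes sense. The vector coherent states (5.9) satisfy the resolution
of the identity (compare with (3.20)),

$$\sum_{i=1}^{M} \int_\Omega |\mathfrak{Z}; i\rangle\langle \mathfrak{Z}; i|\;\mathcal{N}(\mathfrak{Z}^*, \mathfrak{Z})\,d\Omega(\mathfrak{Z}, \mathfrak{Z}^*) = I_{\mathfrak{H}_K} , \tag{5.11}$$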
and the normalization condition:

$$\frac{1}{M} \sum_{i=1}^{M} \langle \mathfrak{Z}; i \mid \mathfrak{Z}; i \rangle = 1 . \tag{5.12}$$
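The diagonalization (5.10) of the kernel can be illustrated numerically in the simplest setting. The sketch below makes several assumptions that are not part of the general construction: it takes $M = 2$, the canonical choice $x_k! = k!$ with $N = \infty$ (so that $\mathcal{N}(\lambda) = e^{\lambda}$), a particular two-parameter family of $2 \times 2$ unitaries, and a truncation of the series (5.8) after 60 terms. All helper names are hypothetical.

```python
import cmath
from math import cos, sin, sqrt, exp, factorial


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def unitary2(alpha, phi):
    """A 2x2 unitary matrix (an illustrative two-parameter family)."""
    return [[complex(cos(alpha)), -cmath.exp(1j * phi) * sin(alpha)],
            [cmath.exp(-1j * phi) * sin(alpha), complex(cos(alpha))]]


def conjugate_diag(u, entries):
    """U diag[entries] U^*, as in (5.2) and (5.10)."""
    d = [[complex(entries[0]), 0j], [0j, complex(entries[1])]]
    return matmul(matmul(u, d), dagger(u))


def kernel_on_diagonal(z_mat, terms=60):
    """Truncated series sum_k (Z^*)^k Z^k / k!, i.e. (5.8) at Z' = Z with x_k! = k!."""
    n = len(z_mat)
    z_dag = dagger(z_mat)
    identity = [[complex(i == j) for j in range(n)] for i in range(n)]
    power, power_dag = identity, identity
    total = [[0j] * n for _ in range(n)]
    for k in range(terms):
        term = matmul(power_dag, power)
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j] / factorial(k)
        power = matmul(power, z_mat)
        power_dag = matmul(power_dag, z_dag)
    return total


def max_kernel_error(lams, thetas, alpha, phi):
    """Largest entrywise difference between the series and U diag[e^{lam_i}] U^*."""
    u = unitary2(alpha, phi)
    zs = [sqrt(lam) * cmath.exp(-1j * th) for lam, th in zip(lams, thetas)]
    k_series = kernel_on_diagonal(conjugate_diag(u, zs))
    k_closed = conjugate_diag(u, [exp(lam) for lam in lams])
    return max(abs(k_series[i][j] - k_closed[i][j])
               for i in range(2) for j in range(2))


if __name__ == "__main__":
    print(max_kernel_error((0.5, 1.2), (0.3, 1.1), 0.4, 0.9))
```

The printed error should be at rounding level, reflecting the fact that $\mathfrak{Z}$, $\mathfrak{Z}^*$ and $K(\mathfrak{Z}^*, \mathfrak{Z})$ are simultaneously diagonalized by $U$.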



The kernel K has matrix elements 



But also, in view of (15.51) . 



(3';i|#(3*,3)|3;j> = ! ^K{y> ,3L)*K{T ,l) x j dSl{3L,T) 

Jn 



(5.13) 



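Using (5.8) the VCS can alternatively be written as

$$|\mathfrak{Z}; i\rangle(\mathfrak{Z}^{*\prime}) = \sum_{k=0}^{N} \frac{\mathfrak{Z}^{*\prime k}\,\mathfrak{Z}^{k}}{x_k!}\,\mathcal{N}(\mathfrak{Z}^*,\mathfrak{Z})^{-1/2}\chi^i = \sum_{j=1}^{M}\sum_{k=0}^{N} \frac{\mathfrak{Z}^{*\prime k}\chi^j}{\sqrt{x_k!}}\;\frac{\chi^{j\dagger}\,\mathfrak{Z}^{k}\,\mathcal{N}(\mathfrak{Z}^*,\mathfrak{Z})^{-1/2}\chi^i}{\sqrt{x_k!}} ,$$

so that,

$$|\mathfrak{Z}; i\rangle = \sum_{j=1}^{M}\sum_{k=0}^{N} \frac{\chi^{j\dagger}\,\mathfrak{Z}^{k}\,\mathcal{N}(\mathfrak{Z}^*,\mathfrak{Z})^{-1/2}\chi^i}{\sqrt{x_k!}}\;\Phi^j_k . \tag{5.14}$$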



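Let $\mathfrak{H}$ be an $(N+1)$-dimensional (complex, separable) Hilbert space and let $\{\phi_k\}_{k=0}^{N}$ be an
orthonormal basis for it. Then the vectors $\chi^i \otimes \phi_k$, $i = 1, 2, \ldots, M$, $k = 0, 1, 2, \ldots, N$,
form an orthonormal basis of $\mathbb{C}^M \otimes \mathfrak{H}$. We make a unitary transformation, $V : \mathfrak{H}_K \longrightarrow
\mathbb{C}^M \otimes \mathfrak{H}$, by the basis change $\Phi^i_k \longmapsto \chi^i \otimes \phi_k$. Under this map, the VCS $|\mathfrak{Z}; i\rangle$ transform
to the vectors

$$|\mathfrak{Z}, i\rangle^{\sim} := V|\mathfrak{Z}; i\rangle = \sum_{j=1}^{M}\sum_{k=0}^{N} \frac{\chi^{j\dagger}\,\mathfrak{Z}^{k}\,\mathcal{N}(\mathfrak{Z}^*,\mathfrak{Z})^{-1/2}\chi^i}{\sqrt{x_k!}}\;\chi^j \otimes \phi_k = \mathcal{N}(\mathfrak{Z}^*,\mathfrak{Z})^{-1/2} \sum_{k=0}^{N} \frac{\mathfrak{Z}^{k}}{\sqrt{x_k!}}\,\chi^i \otimes \phi_k , \tag{5.15}$$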
which are exactly the VCS defined (over matrix domains) in [3]. Also, in this form the
VCS resemble the non-linear coherent states (3.8) more closely. The inverse of the map
$V$ is then easily seen to be given by,



N 



i=l 



(5.16) 



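To return to the discussion of the probability distribution $P(\lambda_1, \lambda_2, \ldots, \lambda_M; n)$ in
(5.1), we first rewrite the VCS (5.15) explicitly in matrix form as:

$$|\mathfrak{Z}, i\rangle^{\sim} = \sum_{k=0}^{N} U \begin{pmatrix} \sqrt{P(k,\lambda_1)}\,e^{-ik\theta_1} & & & \\ & \sqrt{P(k,\lambda_2)}\,e^{-ik\theta_2} & & \\ & & \ddots & \\ & & & \sqrt{P(k,\lambda_M)}\,e^{-ik\theta_M} \end{pmatrix} U^* \chi^i \otimes \phi_k . \tag{5.17}$$

Again, let $P_n = |\phi_n\rangle\langle\phi_n|$ and define

$$P(\mathfrak{Z}, \mathfrak{Z}^*; n) = \frac{1}{M}\,\mathrm{Tr}_{\mathfrak{H}}\left[\sum_{i=1}^{M} |\mathfrak{Z}, i\rangle^{\sim}\,{}^{\sim}\langle \mathfrak{Z}, i|\;(I_M \otimes P_n)\right] , \tag{5.18}$$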



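where $\mathrm{Tr}_{\mathfrak{H}}$ denotes a partial trace in $\mathfrak{H}$. Clearly, $P(\mathfrak{Z}, \mathfrak{Z}^*; n)$ is an $M \times M$ matrix and it
is not hard to see that

$$P(\mathfrak{Z}, \mathfrak{Z}^*; n) = \frac{1}{M}\,U \begin{pmatrix} P(n,\lambda_1) & & & \\ & P(n,\lambda_2) & & \\ & & \ddots & \\ & & & P(n,\lambda_M) \end{pmatrix} U^* . \tag{5.19}$$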



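Now taking the trace in $\mathbb{C}^M$ we immediately see that

$$\mathrm{Tr}_{\mathbb{C}^M}[P(\mathfrak{Z},\mathfrak{Z}^*;n)] = \frac{1}{M}\,\mathrm{Tr}_{\mathbb{C}^M \otimes \mathfrak{H}}\left[\sum_{i=1}^{M} |\mathfrak{Z}, i\rangle^{\sim}\,{}^{\sim}\langle\mathfrak{Z}, i|\;(I_M \otimes P_n)\right] = P(\lambda_1, \lambda_2, \ldots, \lambda_M; n) , \tag{5.20}$$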
which should be compared with (3.23). Finally, the determinant

$$\det\,[M\,P(\mathfrak{Z}, \mathfrak{Z}^*; n)] = P(n, \lambda_1)\,P(n, \lambda_2) \cdots P(n, \lambda_M) , \tag{5.21}$$

denotes the joint probability of getting $n$ "successes" from each distribution.
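The relations (5.19) to (5.21) can be illustrated with a small numerical experiment. The sketch below is an illustration under explicit assumptions: $M = 2$, the Poisson distribution $P(n,\lambda) = e^{-\lambda}\lambda^n/n!$ as the underlying model, and a particular family of $2 \times 2$ unitaries. It builds the matrix $A_n = U\,\mathrm{diag}[\sqrt{P(n,\lambda_i)}\,e^{-in\theta_i}]\,U^*$ read off from (5.17); carrying out the partial trace in (5.18) by hand gives $P(\mathfrak{Z},\mathfrak{Z}^*;n) = \frac{1}{M} A_n A_n^*$, which is what the code computes. All names are hypothetical.

```python
import cmath
from math import cos, sin, sqrt, exp, factorial


def poisson(n, lam):
    """Poisson probability, used here as an example of P(n, lam)."""
    return exp(-lam) * lam ** n / factorial(n)


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def dagger(a):
    size = len(a)
    return [[a[j][i].conjugate() for j in range(size)] for i in range(size)]


def unitary2(alpha, phi):
    """A 2x2 unitary matrix (an illustrative two-parameter family)."""
    return [[complex(cos(alpha)), -cmath.exp(1j * phi) * sin(alpha)],
            [cmath.exp(-1j * phi) * sin(alpha), complex(cos(alpha))]]


def probability_matrix(n, lams, thetas, u):
    """(1/M) A_n A_n^*, with A_n the matrix multiplying chi^i (x) phi_n in (5.17)."""
    m_dim = len(lams)
    d = [[0j] * m_dim for _ in range(m_dim)]
    for i, (lam, theta) in enumerate(zip(lams, thetas)):
        d[i][i] = sqrt(poisson(n, lam)) * cmath.exp(-1j * n * theta)
    a_n = matmul(matmul(u, d), dagger(u))
    product = matmul(a_n, dagger(a_n))
    return [[entry / m_dim for entry in row] for row in product]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


if __name__ == "__main__":
    lams, thetas, n = (0.8, 2.5), (0.2, 1.3), 3
    p = probability_matrix(n, lams, thetas, unitary2(0.6, 0.4))
    mixture = 0.5 * (poisson(n, lams[0]) + poisson(n, lams[1]))
    joint = poisson(n, lams[0]) * poisson(n, lams[1])
    print(trace(p), mixture)                               # (5.20)
    print(det2([[2 * e for e in row] for row in p]), joint)  # (5.21)
```

The trace reproduces the mixture (5.1) and the determinant of $M P$ reproduces the product of the individual probabilities, independently of the phases $\theta_i$ and of $U$.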

Before leaving this topic of matrix valued distributions, let us point out that more
general situations than envisaged by (5.1) can also be treated using similar techniques.
For example, instead of attaching the same weight, $\frac{1}{M}$, to each component $P(n, \lambda_i)$ of
the mixture, we could also attach different weights $\mu_i$ to them (with $\mu_i > 0$ for all $i$ and
$\sum_{i=1}^{M} \mu_i = 1$). Examples of this type will be dealt with in a future publication, where
we shall also allow the possibility of $M$ being infinite.

To treat general joint probabilities of the type,

$$P(n_1, \lambda_1; n_2, \lambda_2; \ldots; n_M, \lambda_M) = P(n_1, \lambda_1)\,P(n_2, \lambda_2) \cdots P(n_M, \lambda_M) , \tag{5.22}$$

it is necessary to go to multidimensional coherent states. We intend to treat this in
greater detail in a future publication, but here we briefly indicate the main idea.
Consider again a discrete distribution $P(n, \lambda)$ of the type (3.11), i.e., such that it has
associated coherent states of the type (3.19). These coherent states $|z\rangle$ are defined on a
Hilbert space $\mathfrak{H}$. Let $\mathfrak{H}^M = \mathfrak{H} \otimes \mathfrak{H} \otimes \cdots \otimes \mathfrak{H}$ be the $M$-fold tensor product of $\mathfrak{H}$ with
itself. On $\mathfrak{H}^M$ we define the vectors,

$$|z_1, z_2, \ldots, z_M\rangle = |z_1\rangle |z_2\rangle \cdots |z_M\rangle = \sum_{n_1, n_2, \ldots, n_M = 0}^{N} \left[P(n_1, \lambda_1; n_2, \lambda_2; \ldots; n_M, \lambda_M)\right]^{1/2} e^{i(n_1\theta_1 + n_2\theta_2 + \cdots + n_M\theta_M)}\;\phi_{n_1} \otimes \phi_{n_2} \otimes \cdots \otimes \phi_{n_M} , \tag{5.23}$$

where the vectors

$$\phi_{n_1} \otimes \phi_{n_2} \otimes \cdots \otimes \phi_{n_M} , \qquad 0 \leq n_1, n_2, \ldots, n_M \leq N ,$$

form an orthonormal basis for $\mathfrak{H}^M$. We call the vectors (5.23) multidimensional coherent
states. Such coherent states have been studied in different contexts before (see, for
example, [21]). These vectors are normalized,

$$\langle z_1, z_2, \ldots, z_M \mid z_1, z_2, \ldots, z_M \rangle = 1 ,$$
and they satisfy the resolution of the identity (compare with (3.20)),

$$\int_{\mathcal{D}^M} |z_1, z_2, \ldots, z_M\rangle\langle z_1, z_2, \ldots, z_M| \prod_{i=1}^{M} \mathcal{N}(\lambda_i)\,d\varrho(\lambda_i)\,d\theta_i = I_{\mathfrak{H}^M} ,$$

where $\mathcal{D}^M = \mathcal{D} \times \mathcal{D} \times \cdots \times \mathcal{D}$ is the $M$-fold cartesian product of the domain $\mathcal{D} =
\{\sqrt{\lambda}\,e^{i\theta} \in \mathbb{C} \mid \lambda \in [0, L),\ \theta \in [0, 2\pi)\}$ over which the coherent states $|z\rangle$ in (3.19) are
defined.
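The structure of the multidimensional coherent states (5.23) is easy to illustrate numerically. The sketch below is a toy example under stated assumptions: it takes the binomial distribution $P(n,\lambda) = \binom{N}{n}\lambda^n(1-\lambda)^{N-n}$ with finite $N$ and $\lambda \in [0,1)$ as the underlying model, stores the expansion coefficients in a dictionary keyed by the multi-index $(n_1, \ldots, n_M)$, and checks that their squared moduli reproduce the joint probability (5.22) and sum to one. The function names are hypothetical.

```python
import cmath
from itertools import product
from math import comb, sqrt


def binomial(n, lam, big_n):
    """Binomial probability, used here as an example of P(n, lam) with finite N."""
    return comb(big_n, n) * lam ** n * (1.0 - lam) ** (big_n - n)


def multi_cs_coefficients(lams, thetas, big_n):
    """Coefficients of |z_1, ..., z_M> in the basis phi_{n_1} (x) ... (x) phi_{n_M}."""
    coefficients = {}
    for indices in product(range(big_n + 1), repeat=len(lams)):
        amplitude = 1 + 0j
        for n, lam, theta in zip(indices, lams, thetas):
            # One factor sqrt(P(n_i, lam_i)) e^{i n_i theta_i} per component, as in (5.23).
            amplitude *= sqrt(binomial(n, lam, big_n)) * cmath.exp(1j * n * theta)
        coefficients[indices] = amplitude
    return coefficients


if __name__ == "__main__":
    lams, thetas, big_n = (0.3, 0.6), (0.5, 1.7), 3
    coeffs = multi_cs_coefficients(lams, thetas, big_n)
    print(sum(abs(c) ** 2 for c in coeffs.values()))  # norm squared, close to 1
    print(abs(coeffs[(1, 2)]) ** 2,
          binomial(1, lams[0], big_n) * binomial(2, lams[1], big_n))
```

The second line of output compares the squared modulus of one coefficient with the corresponding joint probability, which is the sense in which these states act as "generalized square-roots" of the joint distribution.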

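Once again these coherent states appear as "generalized square-roots" of the joint
probability distribution $P(n_1, \lambda_1; n_2, \lambda_2; \ldots; n_M, \lambda_M)$, $0 \leq n_1, n_2, \ldots, n_M \leq N$, and
just as in (3.23),

$$P(n_1, \lambda_1; n_2, \lambda_2; \ldots; n_M, \lambda_M) = \mathrm{Tr}\left[\,|z_1, z_2, \ldots, z_M\rangle\langle z_1, z_2, \ldots, z_M|\;P_{n_1, n_2, \ldots, n_M}\right] = |\langle n_1, n_2, \ldots, n_M \mid z_1, z_2, \ldots, z_M\rangle|^2 , \tag{5.24}$$

with $P_{n_1, n_2, \ldots, n_M} = |n_1, n_2, \ldots, n_M\rangle\langle n_1, n_2, \ldots, n_M|$, where $|n_1, n_2, \ldots, n_M\rangle = \phi_{n_1} \otimes \phi_{n_2} \otimes \cdots \otimes \phi_{n_M}$.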

Recently (see [9]), this formalism has been applied to the construction of vector 
coherent states for the quantum motion of a particle in an infinite square well, enabling 
one to define in an unambiguous way the momentum operator. The construction in [9] 
is based on Gaussian probability distributions but it can be carried out using a large 
class of distributions. 



VI Conclusion 

As mentioned in the Introduction, the relationship between coherent states and statisti- 
cal distributions has been studied before. We have tried to demonstrate here the deeper 
connection between such distributions, both continuous and discrete, and reproducing 
kernel Hilbert spaces, in so far as the latter are the carriers of generalized coherent 
states. Moreover, taking this point of view, it has been possible to connect vector co- 
herent states to mixtures of probability distributions and multi-dimensional coherent 
states to joint probability distributions. The posterior distribution, appearing on the
parameter space of a discrete distribution, is clearly seen to be a consequence of the 
resolution of the identity satisfied by the coherent states. Again this has been noticed 
earlier, but here we are able to put it in a more general context. 

The one intriguing question that arises from the general discussion is the following:
as has been demonstrated, a discrete statistical distribution, or a family of discretely
parametrized continuous distributions, satisfying certain technical conditions, leads to
the existence of coherent states on an associated Hilbert space. These coherent states,
in turn, can be shown to lead to quantum probabilities, embodied in a positive operator
valued measure, on the parameter space. Classical (commutative) and quantum
(non-commutative) probability are intrinsically different in nature, yet it seems to be
possible to make a smooth transition from one to the other. This is reminiscent of the
process of quantization, i.e., the passage from a classical mechanical system to its quan- 
tum counterpart, and in particular, coherent state quantization (see, for example, [2] for 
a review of the theory of quantization and [HI [121 [131 El CE] for a series of examples). So 
one might ask the question as to whether the procedure described above could be con- 
sidered as constituting a quantization of the underlying classical probability theory. In 
this connection it would also be interesting to study more closely the duality appearing 
between the discrete and continuous distributions incorporated in the coherent states 
and the analogous duality familiar from Bayesian statistics. 

Appendix: Some elements of Bayesian inference 

In this Appendix we put together some notions from Bayesian statistical inference that 
have been used in this paper. Some relevant references are 0, [191 EOJ, [27] 

Event space background 

The context is the setup and subsequent performance of an experiment where there is 
a random component to the results and where the set U of possible results is known. 
In the field of statistics, the experiment is called a "random experiment". Events are
identified with measurable subsets of $U$. That is, we say that event $E$ has occurred if
the observed result $u_{\mathrm{obs}}$ is in the subset $E$. One "experiment", of course, could be an
amalgam of a whole set of sub-experiments, sometimes called "trials".



Conditional probabilities 

Let $P(E \mid B)$ designate the conditional probability that event $E$ occurs given that event
$B$ has occurred. Then

$$P(E \mid B) = \frac{P(E \cap B)}{P(B)} ,$$

where the numerator stands for the joint probability of occurrence of events $E$ and $B$
and the denominator is the unconditional probability of occurrence of event $B$ (to ensure
normalization). Consider the conditional probability the other way around, $P(B \mid E)$:

$$P(B \mid E) = \frac{P(E \cap B)}{P(E)} .$$

Suppose that we do not know the joint probability and in fact we only know the first
conditional probability $P(E \mid B)$ and the two unconditional probabilities. Then we can
write

$$P(B \mid E) = \frac{P(E \mid B)\,P(B)}{P(E)} .$$

The probability $P(B \mid E)$ is called the posterior conditional probability for $B$ given $E$
and $P(B)$ is called the prior probability of $B$. Sometimes we compute several of these
posterior probabilities in the cases where the set of events $\{B_1, B_2, \ldots, B_n\}$ is a partition
of $U$ and the events $B_i$ are in the nature of possible causal hypotheses for the subsequent
occurrence of event $E$. Suppose that we know the conditional probabilities $P(E \mid B_i)$
and the unconditional (prior) probabilities $P(B_i)$ for each $B_i$. Then one chooses a likely
hypothesis by computing each of the posterior probabilities.
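The selection of a likely hypothesis described above can be sketched in a few lines. The example assumes that the $B_i$ form a partition, so that $P(E) = \sum_i P(E \mid B_i)\,P(B_i)$; the numerical values and the function name `posteriors` are made up for the illustration. Exact rational arithmetic is used so that the result can be checked by hand.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Posterior probabilities P(B_i | E) from priors P(B_i) and likelihoods P(E | B_i)."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # P(E), by total probability
    return [p * l / evidence for p, l in zip(priors, likelihoods)]


if __name__ == "__main__":
    priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]         # P(B_i), illustrative
    likelihoods = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10)]  # P(E | B_i), illustrative
    post = posteriors(priors, likelihoods)
    print(post)                   # posterior probabilities 3/22, 5/11 and 9/22
    print(post.index(max(post)))  # index of the most likely hypothesis
```

In this toy example the second hypothesis has the largest posterior probability even though the first has the largest prior.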
The case of a continuous family of discrete probability distributions



Consider the performance of a classical experiment in which the outcome has a random
component within the following context. Let $n = 0, 1, 2, \ldots, N$ index the (discrete) set
of possible outcomes of the experiment, where $N$ is a positive integer or $\infty$. For real
parameter $\lambda \in \Lambda$, let $P(n, \lambda)$ be a family of classical discrete probability distributions
indexed by $\lambda$, which serves as a stochastic model for the experiment. We suppose that
$\lambda$ is unknown and the object of the experiment is to obtain data with which to infer a
probability distribution on the parameter space $\Lambda$. After performance of the experiment,
let $k$ indicate the observed outcome. Then construct a conditional probability density
function $f$ for $\lambda$, given $k$, in the form:

$$f(\lambda, k) = \frac{P(k, \lambda)\,\Pi(\lambda)}{\int_\Lambda P(k, \lambda')\,\Pi(\lambda')\,d\lambda'} ,$$

where $\Pi(\lambda)$ is an unconditional probability measure on the parameter space $\Lambda$, arbitrary,
subject to the integrability of the denominator. The measure $\Pi$ is called the prior
measure on $\Lambda$ and the conditional probability density function $f$ is called the density
function of the posterior probability distribution on $\Lambda$.
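The posterior density defined above can be approximated numerically for any model and prior. The sketch below is one possible illustration, with several assumptions: the model is the Poisson family $P(k,\lambda) = e^{-\lambda}\lambda^k/k!$, the parameter space is truncated to $\Lambda = [0, 40]$, the prior is taken to have the flat density $1/40$ on that interval, and the integrals are computed with a midpoint rule. The function names are hypothetical.

```python
from math import exp, factorial


def poisson(k, lam):
    """Poisson model P(k, lam), used as an example family."""
    return exp(-lam) * lam ** k / factorial(k)


def midpoint_integral(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


def posterior_density(model, prior, k, a, b):
    """Return the function lam -> f(lam, k) for the given model and prior density."""
    evidence = midpoint_integral(lambda x: model(k, x) * prior(x), a, b)

    def density(lam):
        return model(k, lam) * prior(lam) / evidence

    return density


if __name__ == "__main__":
    f = posterior_density(poisson, lambda x: 1.0 / 40.0, 3, 0.0, 40.0)
    print(midpoint_integral(f, 0.0, 40.0))                   # total mass, close to 1
    print(midpoint_integral(lambda x: x * f(x), 0.0, 40.0))  # posterior mean, close to 4
```

With a flat prior the posterior is proportional to $\lambda^k e^{-\lambda}$, i.e. (up to the truncation) a $\gamma$-distribution with mean $k + 1$, in line with the Poisson and $\gamma$ duality discussed in the main text.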

Example: Toss a coin $N$ times observing $n$, the number of occurrences of heads. Let
the parameter $p$ be the probability of obtaining heads on one toss. Supposing that $p$
is unknown, the object is to use the outcome of the experiment to obtain a probability
distribution on the parameter space $(0, 1)$. The stochastic model is the binomial family,

$$P(n, p) = \frac{N!}{(N-n)!\,n!}\,p^{n}(1-p)^{N-n} , \qquad \text{for } n = 0, 1, 2, \ldots, N ,$$

where $N$ is a positive integer. After the performance of the experiment, having obtained
$k$ heads, with choice of prior measure $\Pi(p)$, the posterior distribution on $(0, 1)$ is given
by the conditional probability density function,

$$f(p, k) = \frac{p^{k}(1-p)^{N-k}\,\Pi(p)}{\int_0^1 p'^{\,k}(1-p')^{N-k}\,\Pi(p')\,dp'} .$$
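For the coin-tossing example, the posterior can be compared with a closed form in one special case. The sketch below assumes a uniform prior density $\Pi(p) = 1$ on $(0,1)$, for which the posterior is the $\beta$-distribution $\beta(p; k+1, N-k+1)$ in the notation of (4.18). The denominator is evaluated with Simpson's rule, and the function names are hypothetical.

```python
from math import factorial


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def coin_posterior(p, k, big_n, prior=lambda q: 1.0):
    """Posterior density f(p, k) after k heads in N tosses (uniform prior by default)."""
    numerator = p ** k * (1.0 - p) ** (big_n - k) * prior(p)
    evidence = simpson(lambda q: q ** k * (1.0 - q) ** (big_n - k) * prior(q), 0.0, 1.0)
    return numerator / evidence


def beta_density(p, a, b):
    """beta(p; a, b) of (4.18) for positive integers a, b."""
    norm = factorial(a + b - 1) / (factorial(a - 1) * factorial(b - 1))  # 1 / B(a, b)
    return norm * p ** (a - 1) * (1.0 - p) ** (b - 1)


if __name__ == "__main__":
    k, big_n = 7, 10
    for p in (0.3, 0.5, 0.7):
        print(p, coin_posterior(p, k, big_n), beta_density(p, k + 1, big_n - k + 1))
```

The binomial coefficient of the model cancels between numerator and denominator, which is why it does not appear in the formula for $f(p, k)$; with other priors the same code applies, but the closed-form comparison no longer holds.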